Cyrillic Mongolian Named Entity Recognition with Rich Features
نویسندگان
چکیده
In this paper, we first create a Cyrillic Mongolian named entity manually annotated corpus. The annotation types contain person names, location names, organization names and other proper names. Then, we use Condition Random Field as classifier and design few categories features of Mongolian, including orthographic feature, morphological feature, gazetteer feature, syllable feature, word clusters feature etc. Experimental results show that all the proposed features improve the overall system performance and stem features improve the most among them. Finally, with a combination of all the features our model obtains the optimal performance.
منابع مشابه
Mongolian Named Entity Recognition System with Rich Features
In this paper, we first build a manually annotated named entity corpus of Mongolian. Then, we propose three morphological processing methods and study comprehensive features, including syllable features, lexical features, context features, morphological features and semantic features in Mongolian named entity recognition. Moreover, we also evaluate the influence of word cluster features on the ...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کاملLanguage Model for Cyrillic Mongolian to Traditional Mongolian Conversion
Traditional Mongolian and Cyrillic Mongolian are both Mongolian languages that are respectively used in china and Mongolia. With similar oral pronunciation, their writing forms are totally different. A large part of Cyrillic Mongolian words have more than one corresponds in Traditional Mongolian. This makes the conversion from Cyrillic Mongolian to Traditional Mongolian a hard problem. To overc...
متن کاملسیستم شناسایی و طبقهبندی موجودیتهای اسمی در متون زبان فارسی بر پایه شبکه عصبی
Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...
متن کامل